Abstract


Introduction: Acute myeloid leukemia (AML) accounts for a fifth of childhood leukemia. Although survival rates for AML have 
greatly improved over the past few decades, they vary depending on demographic and AML type factors. We predict the five- 
year survival among pediatric AML patients using machine learning algorithms and deploy the best performing algorithm as 
an online survival prediction tool. 

Methods: Pediatric patients (0 to 14 years) with microscopically confirmed AML were extracted from the Surveillance 
Epidemiology and End Results (SEER) database (2000-2011) and randomly split into training and test datasets (80/20 ratio). 
Four machine learning algorithms (logistic regression, support vector machine, gradient boosting, and K nearest neighbor) 
were trained on features to predict five-year survival. Performances of the algorithms were compared and the best performing 
algorithm was deployed as an online prediction tool. 

Results: A total of 1,477 patients met our inclusion criteria. The gradient boosting algorithm was the best performer in terms 
of discrimination and predictive ability. It was deployed as the online survival prediction tool named OSPAM-C 
(https://ashis-das.shinyapps.io/ospam/). 

Conclusions: Our study provides a framework for the development and deployment of an online survival prediction tool for 
pediatric patients with AML. While external validation is needed, our survival prediction tool presents an opportunity to inform 
clinical decision-making for AML patients. 


Introduction


Acute myeloid leukemia (AML) accounts for a fifth of 
childhood leukemia [1,2]. The overall survival of children 
with AML has improved in recent decades due to 
advancements in therapy and is currently 
around 70% [3-5]. However, survival rates vary depending 
on demographic and AML type factors [6-9]. Therefore, 
it is essential to understand the prognostic factors for 
AML outcomes for effective planning of treatment and 
rehabilitation modalities. While a few studies have 
translated the prognostic factors into predictive models for 
AML, they have focused on adult patients and none have used 
machine learning specifically for predicting pediatric patient 
survival [10,11]. 


Machine learning consists of a group of artificial 
intelligence techniques, where the algorithms learn the 
patterns in the data without being explicitly programmed 
to carry out specific applications. Learning from a set of 
data (training data), machine learning algorithms apply a 
predictive model to unseen data (test data) [12]. Utilizing the 
already available data from hospitals and medical databases, 
machine learning has the potential to diagnose health 
conditions, predict appropriate treatment methods and 
patient survival to improve overall quality of life. There have 
been several applications of machine learning in healthcare, 
such as predicting diseases, health events and drug response, 
survival prediction, clustering of patients based on risk 
classification, analyzing genetics data and medical imaging 
[13-17]. In the field of cancer research, a few studies have 
utilized machine learning for predicting cancer survival from 
hospital records and registries [18-23]. The Surveillance 
Epidemiology and End Results (SEER) database is the largest 
publicly available source of cancer statistics in the United 
States and it covers approximately 28% of the population 
[24]. Although several studies have applied machine learning 
to predict patient survival for various cancers using the SEER 
database, none have applied it to AML in pediatric patients 
[20-22]. 


Our study had two objectives: (1) to predict 
the five-year survival among pediatric (0 to 14 years) AML 
patients using machine learning algorithms, and (2) to deploy 
the best performing algorithm as a web application for future 
validation and clinical use. 


Methods 
Patients 


Patients for this study were selected from the Surveillance 
Epidemiology and End Results (SEER) database (1975-2016) 
[25]. The standard for case completeness for the SEER 
database is 98% and all patients were followed up for 10 
years after routine treatment until death or loss to follow-up 
[26]. The database includes patient details from 1975 
through 2016 and reports their demographic background, 
cancer characteristics, and survival. The available variables 
on AML were age, sex, race, marital status, AML histologic 
subtype, AML grade, SEER registry details (name, state and 
county), year of diagnosis, and survival in months. 


Das AK, et al. Machine Learning to Predict 5-Year Survival among Pediatric Acute Myeloid Leukemia 


Journal of Quality in Health care & Economics 


Our inclusion criteria for this study were microscopically 
confirmed AML in patients aged 14 years or younger. We 
excluded patients without microscopically confirmed AML, 
those with unknown survival time, and those diagnosed 
before 2000. To allow an adequate follow-up period after 
diagnosis, we restricted our sample to patients diagnosed 
between 2000 and 2011. A total of 76,382 patients across all 
age groups were diagnosed with AML between 1975 and 
2016. After excluding patients who did not meet our 
inclusion criteria, 1,477 pediatric AML patients were 
included in our study. 
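
The selection rules above can be sketched as a simple filter. This is a minimal illustration, not the authors' code: the record layout and the field names (`microscopically_confirmed`, `age_years`, `survival_months`, `year_of_diagnosis`) are hypothetical and do not correspond to actual SEER column names.

```python
def meets_inclusion_criteria(patient):
    """Return True if a patient record satisfies the criteria described in the text.

    The dictionary keys are hypothetical names chosen for this sketch.
    """
    age = patient["age_years"]
    survival = patient["survival_months"]  # None stands for unknown survival time
    return bool(
        patient["microscopically_confirmed"]
        and age is not None and 0 <= age <= 14
        and survival is not None
        and 2000 <= patient["year_of_diagnosis"] <= 2011
    )


# Small made-up cohort used only to exercise the filter.
cohort = [
    {"microscopically_confirmed": True, "age_years": 6,
     "survival_months": 72, "year_of_diagnosis": 2005},
    {"microscopically_confirmed": True, "age_years": 30,
     "survival_months": 12, "year_of_diagnosis": 2005},
    {"microscopically_confirmed": True, "age_years": 3,
     "survival_months": None, "year_of_diagnosis": 2010},
    {"microscopically_confirmed": True, "age_years": 9,
     "survival_months": 40, "year_of_diagnosis": 1998},
]
included = [p for p in cohort if meets_inclusion_criteria(p)]
```

In this made-up cohort only the first record passes; the others fail on age, unknown survival time, and year of diagnosis respectively.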


Outcome Variable 


Our outcome variable was survival of five years or more 
among AML patients. In the SEER database, survival is a 
continuous variable with units in months. So, we created 
a binary variable where any patient with a survival of 60 
months or more was coded “yes”, or otherwise “no”. 
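
As a minimal sketch of this coding step (the function name is illustrative and not taken from the study), the binary outcome can be derived from survival in months as follows:

```python
def five_year_survival_label(survival_months):
    """Code survival of 60 months or more as "yes", otherwise "no"."""
    return "yes" if survival_months >= 60 else "no"


# Example survival times in months (made-up values).
labels = [five_year_survival_label(m) for m in (12, 59, 60, 130)]
```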


Predictors 


We considered individual patient-level demographic 
and disease variables as predictors. Demographic predictors 
were sex, age (years at diagnosis), and race. There were six 
races - “Hispanic”, “non-Hispanic American Indian/Alaska 
native”, “non-Hispanic Asian or Pacific Islander”, 
“non-Hispanic black”, “non-Hispanic white” and 
“non-Hispanic unknown”. 


Disease variables that were available in the database 
were AML sub-type and grade. In our sample, there were 
14 AML subtypes available according to the 3rd edition of 
the International Classification of Diseases for Oncology 
(ICD-O-3) [27]. The AML subtypes were the following: 
9840/3 - acute erythroid leukemia; 9861/3 - AML, NOS; 
9866/3 - acute promyelocytic leukemia (AML with 
t(15;17)(q22;q12)) PML/RARA; 9867/3 - acute myelomonocytic 
leukemia; 9871/3 - AML with inv(16)(p13.1q22) or 
t(16;16)(p13.1;q22), CBFB-MYH11; 9872/3 - AML with minimal 
differentiation; 9873/3 - AML without maturation; 9874/3 
- AML with maturation; 9895/3 - AML with 
myelodysplasia-related changes; 9896/3 - AML, 
t(8;21)(q22;q22) RUNX1-RUNX1T1; 9897/3 - AML with 
t(9;11)(p22;q23), MLLT3-MLL; 9898/3 - AML with Down 
Syndrome; 9910/3 - acute megakaryoblastic leukemia; and 
9920/3 - therapy related myeloid neoplasm. A vast majority 
of patients (93 percent) had unknown AML grade. Thus, we 
excluded this variable from our analysis. 


Statistical Methods 


Descriptive Analysis: We performed descriptive analyses 
for the predictors stratified by their classes. The correlation 
was tested among all predictors with Pearson’s correlation 
coefficient. 
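
For reference, Pearson's correlation coefficient between two numeric variables can be computed as in the sketch below. It is a plain-Python illustration of the standard formula; the text does not say how categorical predictors were encoded before the correlation was tested, so the inputs here are simply assumed to be numeric lists.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Perfectly increasing and perfectly decreasing made-up relationships.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
```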

Predictive Analysis: We employed machine learning to 
predict five-year survival in AML. We 
applied four supervised machine learning algorithms 
commonly used in cancer research - logistic regression, support 
vector machine, K nearest neighbor classification, and gradient 
boosting - to determine which algorithm provides the highest 
accuracy of prediction. We ran the best-fitting model for each 
algorithm to derive the predictions. The best fit was derived 
through optimization techniques as described under each 
algorithm below. 

Logistic Regression (LR): Logistic regression is used for 
classification problems, i.e. binary or categorical output. The 
algorithm fits the best model to describe the relationship 
between the output and input (predictor) variables [28]. 
We used the grid search function to identify the best fit 
parameters, which were L2 regularization and a penalty 
strength of 1. 
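
To illustrate what L2 regularization adds to the logistic regression objective, the sketch below evaluates a penalized log loss. It is an assumption-laden illustration rather than the authors' implementation: the function names are hypothetical, and the text does not state whether the reported strength of 1 multiplies the penalty or is its inverse, so the sketch treats it as a multiplier.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def penalized_log_loss(weights, bias, rows, labels, strength=1.0):
    """Average log loss plus an L2 penalty of 0.5 * strength * sum(w^2).

    `labels` are 0/1; `strength` is assumed here to multiply the penalty.
    """
    total = 0.0
    for features, label in zip(rows, labels):
        score = bias + sum(w * x for w, x in zip(weights, features))
        p = sigmoid(score)
        total -= label * math.log(p) + (1 - label) * math.log(1.0 - p)
    penalty = 0.5 * strength * sum(w * w for w in weights)
    return total / len(rows) + penalty


# With all-zero weights every prediction is 0.5, so the loss is log(2)
# and the penalty term vanishes.
baseline = penalized_log_loss([0.0, 0.0], 0.0, [[1.0, 2.0], [3.0, 4.0]], [1, 0])
```

Larger weights lower the data term only if they fit the labels better, while the penalty grows with the squared weights; the fitted model balances the two.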

Support Vector Machine (SVM): A support vector machine 
(SVM) classifies the data into two classes, based on the 
output variable, using a separating hyperplane [23]. The 
algorithm tries to maximize the distance between the 
hyperplane and the closest data points from each class. 
There are three critical parameters in SVM - kernel (a 
function that transforms the data, such as linear, radial, 
sigmoid, or polynomial), penalty 
(an error term, also called regularization) and gamma (a 
measure of model fitting). Using grid search for 
optimization, the best parameters in our model for kernel, 
penalty and gamma were radial, 1 and 0.1 respectively. 
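
The role of gamma in a radial kernel can be seen in a short sketch. It assumes the common radial basis function form exp(-gamma * squared distance); the text does not spell out the exact kernel formula used, and the function name is illustrative.

```python
import math


def rbf_kernel(a, b, gamma=0.1):
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * squared_distance)


# Identical points have similarity 1; similarity decays with distance.
same = rbf_kernel((1.0, 2.0), (1.0, 2.0))
far = rbf_kernel((0.0, 0.0), (3.0, 4.0))  # squared distance is 25
```

A larger gamma makes the similarity fall off faster with distance, which lets the decision boundary follow individual training points more closely.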

K Nearest Neighbors (KNN): In the KNN algorithm, the class 
of a new observation is decided by the majority class among 
its neighbors [29]. There are three important parameters for 
KNN - number of nearest neighbors, distance metric and 
weights. Number of nearest neighbors refers to the number of 
closest data points that are consulted to classify a new 
observation. Distance metric 
is a measure of the distance between the new observation 
and the nearest neighbors. Three commonly used distance 
metrics are Euclidean, Manhattan and Minkowski. Weights 
determine the contribution of the members in 
the neighborhood. The members can be weighted equally 
(uniform weight) or nearer members can be given higher weights 
(distance weight). Using grid search for optimization, 
the best parameters in our model were 15 nearest neighbors, 
Manhattan metric and uniform weights. 
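
A minimal sketch of KNN classification with the Manhattan metric and uniform weights is shown below. It is illustrative only: the function names and the toy data are made up, and a small k is used in the example because the toy training set has only five points.

```python
from collections import Counter


def manhattan(a, b):
    """Manhattan (city-block) distance between two numeric vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_points, train_labels, query, k=15):
    """Majority vote among the k nearest training points (uniform weights)."""
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: manhattan(pair[0], query),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training data (made up): two clusters with different labels.
points = [(0, 0), (0, 1), (1, 0), (5, 5), (6, 5)]
labels = ["no", "no", "no", "yes", "yes"]
near_origin = knn_predict(points, labels, (0.5, 0.5), k=3)
near_cluster = knn_predict(points, labels, (5, 6), k=3)
```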

Gradient Boosting: Gradient boosting is an algorithm 
that uses a combination of shallow and successive decision 
trees [30]. Decision trees consist of recursive partitioning 
(also known as splitting) of the predictors. Each decision 
tree is fitted in succession to improve on the previous 
trees, and the learning rate controls how much each tree 
contributes. One must define the maximum depth for 
each decision tree (number of levels up to which splitting 
continues) and minimum leaf sample to split (minimum 
number of observations required in a node to be considered 
for splitting). Eventually, predictions are based on a weighted 
combination of these trees. We used grid search to 
optimize model parameters. The best fit parameters were 
80 decision trees, a maximum depth of three for each tree, 
minimum leaf samples of seven to split, a maximum of three 
features and a learning rate of 0.15. 
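
The successive fitting idea can be sketched in a few lines. This is a deliberately simplified illustration and not the study's model: it boosts one-split regression stumps on a single numeric feature with squared-error loss, whereas the study used classification trees of depth three on several predictors. Only the number of trees (80) and the learning rate (0.15) are taken from the text; all function names are hypothetical.

```python
def fit_stump(xs, residuals):
    """Find the threshold split on one feature that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all x values identical: fall back to a constant stump
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1:]


def boost(xs, ys, n_trees=80, learning_rate=0.15):
    """Fit stumps one after another, each on the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left, right = fit_stump(xs, residuals)
        stumps.append((threshold, left, right))
        predictions = [p + learning_rate * (left if x <= threshold else right)
                       for x, p in zip(xs, predictions)]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the base value and the scaled contribution of every stump."""
    base, learning_rate, stumps = model
    return base + sum(learning_rate * (left if x <= threshold else right)
                      for threshold, left, right in stumps)


# Toy data (made up): the target switches from 0 to 1 between x = 3 and x = 4.
model = boost([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
```

Each round corrects only a fraction (the learning rate) of the remaining error, so many small trees are combined rather than one large one.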

Evaluation of the performance of the algorithms: The data 
was split into training (80 percent) and test segments (20 
percent) for all algorithms. First, the algorithms were trained 
on the training segment and then were validated on the test 
segment for determining predictions. The data was 10-fold 
cross-validated with the data split into 80% training and 20% 
test observations randomly ten times for all algorithms. The 
average of the cross-validations was taken as the final result. 
The models were evaluated with accuracy (proportion of all 
patients correctly predicted, i.e. survived patients as survived 
and non-survived patients as non-survived), precision (ratio of 
correctly predicted survived patients to the total predicted 
survived patients), recall (ratio of correctly predicted survived 
patients to all patients who actually survived), F1 score 
(harmonic mean of precision and recall), 
and area under the receiver operating characteristic curve 
(AUC) [31]. A receiver operating characteristic (ROC) curve 
presents a plot of the true positive rate (y-axis) against the 
false positive rate (x-axis) for each individual algorithm. AUC 
measures the area under the ROC curve, and in practice it 
ranges from 0.50 to 1.0, where 0.50 indicates discrimination 
no better than chance and 1.0 indicates perfect discrimination. 
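
The evaluation metrics can be made concrete with a short sketch. It is an illustration with hypothetical function names and made-up data, not the authors' code. Labels are coded 1 for survived and 0 for not survived, and AUC is computed through its rank interpretation: the probability that a randomly chosen survivor receives a higher predicted score than a randomly chosen non-survivor, with ties counted as one half.

```python
def classification_metrics(actual, predicted):
    """Accuracy, precision, recall and F1 for 0/1 labels (1 = survived)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def auc(actual, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for a, s in zip(actual, scores) if a == 1]
    negatives = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Made-up test segment: true outcomes, predicted classes, predicted probabilities.
actual = [1, 1, 1, 0, 0]
predicted = [1, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2]
metrics = classification_metrics(actual, predicted)
area = auc(actual, scores)
```

In this toy example two of three survivors are correctly identified and one non-survivor is misclassified, so precision, recall and F1 all equal 2/3, while five of the six survivor/non-survivor pairs are ranked correctly.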


The statistical analyses were performed using Python 
programming language Version 3.7 (Python Software 
Foundation, Wilmington, DE, USA) and the deep neural 
network was implemented on the TensorFlow platform [32]. 
The web application was built using the Shiny package for R 
and deployed with Shiny server (R Foundation for Statistical 
Computing, Vienna, Austria). 


Results 


In this section, we present the profile of patients, 
performance of the algorithms and our online survival 
prediction tool. 


Patient Profile 


The demographic profile of the patients is presented 
in Table 1. The mean age of the patients was 6.1 years with 
a standard deviation of 5. Slightly above half were males 
(52.9%). Among various races, non-Hispanic whites were 
the largest group (43.4%) followed by Hispanics (31.8%) and 
non-Hispanic blacks (13.7%). Out of all AML subtypes, 
patients with AML not otherwise specified (NOS) were the 
largest group (39.2%). Close to 60% of the patients in our 
sample had a survival of five or more years. The correlation 
coefficients between the predictors ranged from -0.14 to 
0.02. 


Variable                                                                        Number  Proportion (%)
Age (years)                                                                     6.1#    5.0*
Sex
  Female                                                                        696     47.1
  Male                                                                          781     52.9
Race
  Hispanic                                                                      469     31.8
  Non-Hispanic American Indian/Alaska native                                    21      1.4
  Non-Hispanic Asian or Pacific Islander                                        137     9.3
  Non-Hispanic black                                                            203     13.7
  Non-Hispanic unknown                                                          6       0.4
  Non-Hispanic white                                                            641     43.4
AML subtype
  9840/3 - acute erythroid leukemia                                             30      2.0
  9861/3 - AML, NOS                                                             579     39.2
  9866/3 - acute promyelocytic leukemia (AML with t(15;17)(q22;q12)) PML/RARA   145     9.8
  9867/3 - acute myelomonocytic leukemia                                        131     8.9
  9871/3 - AML with inv(16)(p13.1q22) or t(16;16)(p13.1;q22), CBFB-MYH11        30      2.0
  9872/3 - AML with minimal differentiation                                     71      4.8
  9873/3 - AML without maturation                                               72      4.9
  9874/3 - AML with maturation                                                  114     7.7
  9895/3 - AML with myelodysplasia-related changes                              20      1.4
  9896/3 - AML, t(8;21)(q22;q22) RUNX1-RUNX1T1                                  44      3.0
  9897/3 - AML with t(9;11)(p22;q23), MLLT3-MLL                                 25      1.7
  9898/3 - AML with Down Syndrome                                               15      1.0
  9910/3 - acute megakaryoblastic leukemia                                      169     11.4
  9920/3 - therapy related myeloid neoplasm                                     32      2.2
Five-year survival
  Yes                                                                           876     59.3
  No                                                                            601     40.7

# Mean; * Standard deviation 
Table 1: Patient profile. 


Performance of the Algorithms 


The performance metrics of the algorithms (logistic 
regression, support vector machine, K nearest neighbor, 
and gradient boosting) are shown in Table 2. The accuracy 
of gradient boosting was the highest (0.681) followed by 
KNN (0.635), SVM (0.618), and logistic regression (0.588). 
F1-score (harmonic mean of precision and recall) was the 
highest for gradient boosting (0.692), followed by SVM 
(0.672), logistic regression (0.664), and KNN (0.663). Area 
under the receiver operating characteristic curve (AUC) ranged 
from 0.561 to 0.726 with the highest score for the gradient 
boosting algorithm. Considering all the performance metrics, 
gradient boosting was the best performer. 


Metrics         Logistic regression  Support vector machine  K nearest neighbor  Gradient boosting
Accuracy (SD*)  0.588 (0.077)        0.618 (0.047)           0.635 (0.037)       0.681 (0.047)
Precision       0.721                0.713                   0.688               0.692
Recall          0.615                0.636                   0.639               0.692
F1-score        0.664                0.672                   0.663               0.692
AUC             0.635                0.561                   0.665               0.726

*Standard deviation of 10-fold cross-validated accuracy 
Table 2: Performance metrics of the algorithms. 
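
As a worked check of the harmonic-mean relationship, the F1-scores in Table 2 can be recomputed from the reported precision and recall. The snippet below is only an arithmetic illustration using the published values; the variable and function names are made up for this sketch.

```python
# Precision and recall as reported in Table 2.
table2 = {
    "Logistic regression": (0.721, 0.615),
    "Support vector machine": (0.713, 0.636),
    "K nearest neighbor": (0.688, 0.639),
    "Gradient boosting": (0.692, 0.692),
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


f1_by_algorithm = {
    name: round(f1_score(p, r), 3) for name, (p, r) in table2.items()
}
```

Rounded to three decimals, the recomputed values (0.664, 0.672, 0.663 and 0.692) agree with the F1-score row of Table 2.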


Online Survival Prediction Tool-OSPAM-C 


The best performing model, gradient boosting, was 
deployed as the online survival prediction tool named 
“Online Survival Prediction tool for Acute Myeloid Leukemia 
in children” - “OSPAM-C” 
(https://ashis-das.shinyapps.io/ospam/). As shown in Figure 1, 
the user interface has four 
boxes to select input features as drop-down menus. The 
features are age (fourteen options - 0 through 14 years), sex 
(two options - male and female), race (six options - Hispanic, 
non-Hispanic American Indian/Alaska native, non-Hispanic 
Asian or Pacific Islander, non-Hispanic Black, non-Hispanic 
white and unknown) and AML sub-type (seventeen options 
according to the 3rd edition of the ICD-O-3 and WHO 2008 
definitions). A user has to select one option each from the 
feature boxes and click the submit button to estimate the 
five-year survival probability as a percentage. For instance, 
the tool gives a five-year survival prediction of 57.4% for a 
12-year-old female Hispanic patient with AML with 
maturation (9874/3). 


[Screenshot of the OSPAM-C interface. Title: “Welcome to the Online Survival Prediction tool for Acute Myeloid Leukemia in Children (OSPAM-C)”. Instructions: “Select the input values from the drop-down menu in the boxes. Then, click the Submit button for predictions.” The example shows the selections Female, Hispanic and AML sub-type “9874/3 - AML with maturation”, a Submit button, and the output “Prediction: Probability of 5-year Survival: 57.4%”.]


Figure 1: OSPAM-C online survival prediction tool for pediatric AML patients. 


Discussion 


In this study, we utilized machine learning algorithms 
to predict five-year survival among pediatric AML patients. 
Among all our algorithms, gradient boosting performed the 
best and was deployed as an online survival prediction tool 
for pediatric AML named OSPAM-C. 


Acute myeloid leukemia is one of the most common 
malignancies among children. While the overall survival has 
improved for children in recent times, it still has one of the 
worst survival probabilities among the leading pediatric 
cancers. AML is also a heterogeneous condition with several 
biological, clinical and genetic factors influencing treatment 
response and prognosis [33]. While a few studies have explored 
the predictors of AML survival among children by applying 
conventional analytic methods to the SEER database, none have 
applied machine learning yet [7,34,35]. 


There are a few predictive web applications to estimate 
survival for other cancers from the SEER database, such as 
chondrosarcoma, spinal chordoma, and glioblastoma 
[21,36,37]. However, we believe this is the first web-based 
survival prediction model for pediatric AML patients. 
Using the SEER database, Thio, et al. [21] and Karhade, et al. 
[36] applied machine learning algorithms to 
1,554 chondrosarcoma and 265 spinal chordoma patients, 
respectively, to predict five-year survival. They utilized decision tree, 
support vector machine, Bayes point machine and neural 
networks. In both studies, the Bayes 
point machine was the best performer and was deployed 
as the web application. Similarly, for glioblastoma, Senders, et al. 
applied 15 machine learning and statistical algorithms - accelerated 
failure time (AFT), bagged decision trees, boosted decision 
trees, boosted decision trees survival, Cox proportional 
hazards regression (CPHR), extreme boosted decision 
trees, k-nearest neighbors, generalized linear models, lasso 
and elastic-net regularized generalized linear models, 
multilayer perceptron, naive Bayes, random forests, random 
forest survival, recursive partitioning, and support vector 
machines [37]. The AFT algorithm was deployed as the 
online prediction tool. The C-statistics (AUC) were 0.868, 0.8 
and 0.7 respectively for chondrosarcoma, spinal chordoma, 
and glioblastoma predictions with their best performing 
models, whereas it was 0.726 in our best performing model. 


Our study has several potential limitations. First, as we 
used SEER data, there were certain missing clinical features 
such as treatment type, response to initial therapy, stage 
and extent of disease. Moreover, due to unavailability of 
meaningful responses, we had to drop the grade of AML. 
Second, the database does not collect information on key 
socio-demographic features such as geographic location, 
household education and economic status. Third, there 
was no information in the database on molecular biology, 
genomics, proteomics, or metabolomics factors. All these 
additional clinical and socio-demographic factors are known 
to influence survival in AML patients. Inclusion of these 
additional features may improve the accuracy and reliability 
of the model. 


Our survival prediction tool is the first of its kind for 
pediatric AML. Although we used data from the largest cancer 
database in the US, the tool is yet to be validated. Therefore, 
we advise caution for clinicians and patients who intend to 
use this tool as a predictive guide for ascertaining survival 
for pediatric AML patients. Clinical experts must balance the 
predictions from this tool against their clinical experience, 
genomics and other relevant clinical information. We hope 
this tool will be further validated and possibly reoptimized 
using heterogeneous data from various cohorts in multiple 
practice settings. While external validation is needed, our 
survival prediction tool presents an opportunity to inform 
clinical decision-making for AML patients. 